ABSTRACT

The strengths of item-response theory (IRT) are used 
to examine the degree of information individual test items provide, 
as well as to investigate how the individual item types contribute to 
the overall measurement accuracy of the Illinois Goal Assessment 
Program (IGAP) reading test. Using the graded-response model of 
Samejima (1969), the amount of information each subtest (narrative
and expository) provides about the underlying latent ability is
studied. Where an item type provides the most information along this 
ability scale, and how the different item formats (e.g., number of 
correct inferences) differ in terms of ability to discriminate 
between levels of reading proficiency are also studied. Data sets of 
4,837, 4,840, and 5,011 randomly selected examinees were obtained for 
grades 3, 6, and 8, respectively. While the expository subtest is 
generally more informative than the narrative subtest across the 
three grade levels for low to moderate theta values, the difference 
does not appear to be substantial. The graded response model appears 
to be a promising tool that allows examination of the information 
from each subtest. Fourteen figures illustrate the findings. 
(Contains 10 references.) (SLD)

Abstract



The purpose of this inquiry is to use the strengths of Item
Response Theory (IRT) to examine the degree of information individual
test items provide, as well as to investigate how the individual item types
contribute to the overall measurement accuracy of the IGAP reading test.
Using Samejima's (1969) graded response model, this paper compares 
the amount of information each subtest, narrative and expository, 
provides about the underlying latent ability; where along this ability scale 
an item type provides the most information; and how the different item 
formats (e.g., number of correct inferences) differ in terms of their ability 
to discriminate between levels of reading proficiency.

In contrast to more traditional reading assessments that use
isolated paragraphs and fragmented text, the passages used in the
Illinois Goal Assessment Program (IGAP) are intact pieces of literature,
stories, and essays that match classroom reading assignments and 
typical student reading experiences. The items associated with each 
passage require students to demonstrate various levels of cognitive 
skills, from explicit response to drawing conclusions that are not directly 
stated, solving problems not discussed within the text, and using 
information derived from the reading passage. Because texts often 
support more than one correct inference, the Illinois reading assessment 
uses a multiple response (or multiple correct) rather than a multiple 
choice format. There may be more than one correct conclusion or
inference for each item, with credit awarded each time a correct or
incorrect inference is identified as such.

"IGAP defines reading as a dynamic process by which readers 
combine background knowledge, reading ability, strategic awareness, 
and information from the text to construct meaning." 1 The format to 
assess an examinee's ability to construct meaning presents the 
examinee with two passages, one narrative (story type) and one 
expository (informational type) with questions accompanying each 
passage. The test is administered in two 40-minute sessions, one type of 
passage per session, with a rest period between sessions. 

The item structure utilizes a multiple correct inference format 
where each item may have one, two, or three correct inferences. Thus, 
examinees have to identify which conclusions are correct and which are
incorrect. Each sub-test (narrative and expository) has 15 items. Each
item is constructed using one or more of the five questioning types:
Explicit, Inference Level I, Inference Level II, Application Transfer, or
Vocabulary. Third grade examinees respond directly on the test booklet, 
while sixth and eighth grade examinees mark a separate answer sheet. 

Results from the IGAP reading examination are used to identify
whether an examinee fails to meet, meets, or exceeds a predetermined
standard. Two cut scores along the underlying ability scale differentiate
between fails to meet and meets the standard, and between meets and
exceeds the standard.

This examination structure would appear to be rich in information 
necessary to make critical attainment decisions. One of the strengths of 
Item Response Theory (IRT) modeling, when all of the underlying 
assumptions of the model are met, is that practitioners can gain a great 
deal of information about individual test items and how they contribute to 
the overall measurement accuracy of the test. 

A review of reading assessment literature produced no study that 
applied a graded response model to analyze reading test results. 
However, within the area of language arts, a partial credit model has 
been utilized to analyze narrative writing tasks in order to identify aspects 
of writing that function differently (Harris, Laan, and Mossen, 1988). 
Additionally, from an IRT perspective, Ackerman (1986) used a graded 
model to compare holistically scored essays with multiple choice writing 
tests in an attempt to see which is more informative, and at which abilities 
the most information is provided.

Although the partial credit model requires a continuum, the graded
model requires only an ordering. Items within the IGAP reading test are 
not structured to be interpreted on a continuum, but do allow ordering. 

Graded Response 

Prior to the 1993 IGAP reading examination, test results had been 
analyzed and equated by procedures that are rooted in classical test 
theory and limited to free response items that had been dichotomously 
scored. Unlike such binary items, the item variable scale in the graded 
scoring procedure is divided into ordered categories. As such, the lowest 
category contributes least to a person's test score while the highest 
category would contribute most. 

It is an underlying assumption that an item scored on a graded
basis possesses a hypothetical continuous item variable, ranging from
$-\infty$ to $+\infty$, that has been divided into $m_j$ response
categories for a given item. The response categories are ordered, with $k$
denoting an arbitrary category, $k = 0, 1, 2, \ldots, m_j$, where $m_j$ is the
number of response categories for item $j$ (Baker, 1992).

The probability of an examinee responding in category $k$ or higher
follows a logistic curve, and the probability of responding in exactly
category $k$ is the difference between two adjacent curves:

$$P(x = k) = \frac{1}{1 + \exp[-a(\theta - b_k)]} - \frac{1}{1 + \exp[-a(\theta - b_{k+1})]}$$

where $a$ is the item discrimination and $b_k$ is the difficulty level for
category $k$ (Hambleton, Swaminathan, & Rogers, 1991).

The IGAP reading examination provides six categories, $k = 0, 1, \ldots, 5$.
The assumption of equal discrimination parameters for $k = 0, 1, \ldots, 5$,
that is, the homogeneous case of the graded response model, yields
response curves that are monotonic for $k = 0$ and $k = 5$ and generally
non-monotonic for $k = 1, 2, 3, 4$ (Hambleton & Swaminathan, 1985).
When plotted in concert (Figure 1), the
interrelationship among the six response categories is more easily
recognized.
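
As a minimal sketch of the homogeneous graded response model described above, the following Python function computes the six category probabilities for one item at a given $\theta$. The function names and the parameter values (a = 1.2, thresholds from -2 to 2) are hypothetical illustrations, not IGAP estimates, and no scaling constant is applied to the logistic.

```python
import math


def boundary_probability(theta, a, b):
    """Logistic probability of responding in a given category or higher."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def category_probabilities(theta, a, thresholds):
    """Return P(x = k) for k = 0..len(thresholds), equal discrimination a.

    thresholds must be increasing: b_1 < b_2 < ... < b_m.
    """
    # Pad the cumulative curves with 1 (category 0 or higher)
    # and 0 (above the highest category).
    cumulative = (
        [1.0]
        + [boundary_probability(theta, a, b) for b in thresholds]
        + [0.0]
    )
    # Each category probability is the difference of adjacent curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical parameters for one six-category item (not IGAP estimates).
thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = category_probabilities(theta=0.0, a=1.2, thresholds=thresholds)
```

Evaluating the function at a very low and a very high $\theta$ shows the lowest and highest categories, respectively, becoming the most probable response.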



Insert Figure 1 about here 



At the higher $\theta$ levels, category $k = 5$ has the highest probability of
occurring; at the lowest $\theta$ levels, category $k = 0$ has the highest probability
of occurring. For the middle ability levels, categories $k = 1, 2, 3, 4$ are
the most probable.

At each grade level measured, the IGAP reading examination has
15 items with a 6-point scale (0, 1, 2, 3, 4, and 5) for both the narrative
and expository passages. For this analysis, (15 x 5) + 15 = 90 item
parameter values (five category thresholds for each of the 15 items, plus
one discrimination parameter per item) would be estimated for each
sub-test (narrative and expository), using the graded response model in
MULTILOG (Thissen, 1991). The program employs marginal maximum
likelihood estimation (MMLE) to obtain item parameter estimates.

Item parameters estimated to fit the graded response model were
subsequently used to compute the item information functions. The
amount of information yielded by an item at ability level $\theta$ in the
polytomous case can be expressed as:

$$I_j(\theta) = \sum_{k=1}^{m_j} I_k(\theta) P_k(\theta)$$

where the quantity $I_k(\theta) P_k(\theta)$ is the amount of information share of
category $k$ (Baker, 1991, p. 240).
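
The item information function can be sketched in Python as follows. Summed over all response categories, the category shares reduce to the sum of $[P_k'(\theta)]^2 / P_k(\theta)$, a standard result for the graded response model. The sketch assumes logistic boundary curves with a common discrimination; the function name and parameter values are hypothetical, not IGAP estimates.

```python
import math


def item_information(theta, a, thresholds):
    """Item information for a graded item with equal discrimination a.

    Computes sum_k (dP_k/dtheta)^2 / P_k over all response categories.
    """
    # Cumulative boundary curves, padded with 1 and 0 at the extremes.
    cum = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        # Slope of a logistic boundary curve is a * P * (1 - P);
        # the padded constants 1 and 0 contribute zero slope.
        dp_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:  # guard against underflow at extreme theta
            info += dp_k ** 2 / p_k
    return info


# Hypothetical six-category item versus the same item scored dichotomously.
graded = item_information(0.0, 1.2, [-2.0, -1.0, 0.0, 1.0, 2.0])
dichotomous = item_information(0.0, 1.2, [0.0])
```

With a single threshold the function reduces to the familiar two-parameter logistic information, $a^2 P (1 - P)$, which gives a quick check on the implementation.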

Polytomous scoring yields more item information than dichotomous
scoring of the same item (Samejima, 1969, p. 40). As such, the graded
response case would be expected to produce a smaller standard error
for the estimate of the examinee's latent ability than the dichotomous
case (Baker, 1991, p. 244).
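
The link between information and standard error can be made explicit with the usual asymptotic relationship for a maximum likelihood ability estimate, given here as a reminder and not as a formula taken from the paper, where $I(\theta)$ denotes the test information at $\theta$:

$$SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}$$

Under this relationship, any increase in information at a given $\theta$ lowers the standard error of the ability estimate at that point.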

The information functions for the two components
(narrative and expository) of the IGAP reading examination, which are
measuring the same ability, $\theta$ (Bolt & Ackerman, 1994), can be
compared as:

$$RE(\theta) = \frac{I_N(\theta)}{I_E(\theta)}$$

where $RE(\theta)$ is the relative efficiency and $I_N(\theta)$ and $I_E(\theta)$ are the
information functions for the narrative subtest and the expository subtest,
defined over a common ability scale ($\theta$) (Hambleton, et al., 1991, p. 96).
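
A minimal Python sketch of a relative efficiency comparison follows. It assumes the ratio is taken as narrative information over expository information, so values below 1 favor the expository subtest. The information values are hypothetical and chosen only to illustrate the calculation.

```python
def relative_efficiency(info_narrative, info_expository):
    """Pointwise ratio of two test information functions on a shared theta grid."""
    return [n / e for n, e in zip(info_narrative, info_expository)]


# Hypothetical test information values at three theta points (not IGAP results).
theta_grid = [-1.0, 0.0, 2.5]
info_narrative = [4.00, 5.00, 3.00]
info_expository = [4.20, 5.35, 2.90]

re_values = relative_efficiency(info_narrative, info_expository)
# Values below 1 mean the expository subtest is more informative at that theta.
more_informative = ["expository" if r < 1.0 else "narrative" for r in re_values]
```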



Method 



Data sets of 4837, 4840, and 5011 randomly selected examinees
were obtained for grades three, six, and eight, respectively. The data set
consisted of response patterns for the fifteen testlets within both the
narrative and expository subtests, a total of thirty items. Each of the thirty
items was scored polytomously with categories $k = 0, 1, 2, \ldots, 5$. Item
parameters were estimated using MULTILOG (Thissen, 1991).
FORTRAN programs were constructed to produce item and test 
information functions, as well as relative efficiency comparisons between 
the narrative and expository subtests at each grade level. 

Results

Comparing sub-tests 

Figures 2-4 plot the relative efficiency of the narrative and
expository subtests for grades three, six, and eight, respectively. Figure 2
illustrates that the expository subtest provides more information for ability
levels less than a $\theta$ value of +2.00. The narrative subtest provides
greater information for those examinees with a $\theta$ value greater than
+2.00. In both cases, however, the extent of "more" information is less than ten
percent. At grade six, the amount of information provided by a specific
sub-test varies across the ability scale. For those examinees with
extremely low $\theta$ values (less than -2.75) and moderately high $\theta$ values
(greater than +1.6), the narrative subtest provides more information. For
those sixth grade examinees with $\theta$ values between -1.7 and +1.6, the
expository subtest is more informative. Again, as was the case in grade
three, the extent of greater information is less than ten percent.



Insert Figures 2-4 about here 



The amount of information provided by each sub-test across the common 
ability scale is easily observed in Figure 3. Figure 4 illustrates that the 
expository sub-test is increasingly more informative as one moves up
the ability scale, providing greater than ten percent more information for
those examinees with estimated $\theta$ values greater than +1.5.

As a practical matter, the test constructor(s) can observe the 
relative efficiency of one sub-test compared with another throughout the 
$\theta$ range. For example, at a $\theta$ value of 0.5, the eighth grade expository subtest
is providing approximately 7% more information than the eighth grade
narrative subtest. That is, to have an equal amount of information
provided by each sub-test, the test constructor(s) would need to increase
the number of narrative items by 1.
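
The arithmetic behind this example can be sketched as follows, assuming that test information grows roughly in proportion to the number of items of comparable quality (an assumption not stated in the text). The function name is hypothetical.

```python
def extra_items_needed(n_items, information_ratio):
    """Items to add to the weaker sub-test so it matches the stronger one.

    information_ratio is the stronger sub-test's information divided by
    the weaker sub-test's information at the theta of interest.
    """
    return round(n_items * information_ratio) - n_items


# 15 narrative items, expository sub-test about 7% more informative:
# 15 * 1.07 = 16.05, which rounds to 16 items, i.e. one extra item.
extra = extra_items_needed(n_items=15, information_ratio=1.07)
```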

Comparing questioning types

Each five part testlet can be denoted by the five questioning
formats: Explicit, Inferential I, Inferential II, Application Transfer, and
Vocabulary. The testlet may consist of a single item type across all five
questions or may consist of multiple item types across all five parts. For
this analysis, item types across testlets were selected. That is, each item
type (Explicit, Inferential I, Inferential II, Application Transfer, and
Vocabulary) is represented within the analysis.

Figures 5-7 present the item information functions by question
type. Regardless of question type, more information is
provided for examinees of estimated lower ability. In general, the amount
of information declines for $\theta$ values of 1.00 regardless of grade. It should
be noted that the grade eight exam possesses only four questioning
types, as one of the questioning types is not included in the examination.



Insert Figures 5-7 about here



Comparing items by number of correct inferences

To compare items with different numbers of correct inferences (i.e.,
one, two, or three), their information functions are examined. The
findings are similar to those seen when information functions for item
types are compared. Regardless of grade level, items with one, two, or
three correct responses provide the most information for examinees
with estimated $\theta$ values less than 0 (Figures 8-10).



Insert Figures 8-10 about here 



Figures 11 graphically represent the mean item information
function by number of correct inferences. The mean item information
function accounts for the differing number of items with one, two, or three
correct responses. The testlet information function is the sum of the
item information functions. Without compensating for the number of items
within a testlet, one may assume a given item type is more informative
because its sum is larger when, in fact, its sum is larger because there
are more items in the testlet.
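
The distinction between summed and mean information can be illustrated with a small Python sketch. The item information values below are hypothetical and were chosen so that the group with more items has the larger sum but the smaller mean.

```python
def summed_information(item_infos):
    """Total information contributed by a group of items at one theta."""
    return sum(item_infos)


def mean_item_information(item_infos):
    """Average information per item, which removes the effect of group size."""
    return sum(item_infos) / len(item_infos)


# Hypothetical item information values at a single theta (not IGAP results).
one_correct = [0.30, 0.30, 0.30]                      # three items
three_correct = [0.20, 0.20, 0.20, 0.20, 0.20, 0.20]  # six items

sum_one, sum_three = summed_information(one_correct), summed_information(three_correct)
mean_one, mean_three = mean_item_information(one_correct), mean_item_information(three_correct)
```

Here the six-item group has the larger total only because it contains more items; per item, the three-item group is the more informative of the two.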

For grades three, six, and eight the one correct response items 
provide more information than the two correct response items, and both 
provide more information than the three correct response items. 



Insert Figures 11 about here



As expected, the mean item information function by number correct item
type is representative of the individual items, providing more information
for low to moderate $\theta$ values.

Discussion 

While the expository subtest is generally more informative than the
narrative subtest across the three grade levels for low to moderate $\theta$
values, the difference does not appear to be substantial. If one were to try
to equalize the amount of information for both sub-tests, one would need
to weigh the cost of increasing the number of narrative items (or
decreasing the number of expository items) and the subsequent effect of
altered test length on the administration time of the exam.

Given that one of the purposes of this exam is to identify 
examinees that fail to meet, meet, and exceed a predetermined 
standard, one would expect the items to be providing a significant 
amount of information at the two cut scores. IGAP reading exam items 
appear to provide the most information at the low ability levels, and not at 
either of the two cut scores (Figure 12). Few items are very informative
at the moderate and high ability levels. 



Insert Figure 12 about here



IRT analysis provides the IGAP reading test constructors with the 
opportunity to reconstruct or add items to provide information about
examinees at moderate and higher $\theta$ values while maintaining a balance
between the narrative and expository subtests' information functions.

The format of the IGAP reading examination represents an 
alternative to traditional reading assessment instruments, and the 
opportunity to provide educators with greater information regarding their 
students' reading performance. The graded response model (Samejima, 
1969) appears to be a promising tool that allows test constructors an 
opportunity to investigate and compare the amount of information for 
each subtest (narrative and expository) and item (type and number 
correct) structure.

Figure Captions

Figure 1. Equal discrimination parameters.

Figure 2. Third grade narrative vs. expository relative efficiency.

Figure 3. Sixth grade narrative vs. expository relative efficiency.

Figure 4. Eighth grade narrative vs. expository relative efficiency.

Figure 5. Third grade item information function by question type.

Figure 6. Sixth grade item information function by question type.

Figure 7. Eighth grade item information function by question type.

Figure 8. Third grade item information function by number of correct
responses.

Figure 9. Sixth grade item information function by number of correct
responses.

Figure 10. Eighth grade item information function by number of correct
responses.

Figure 11. Mean item information function by number of
correct responses and grade.

Figure 12. [illegible] item information function by number of correct
responses.

Figure 13. [illegible] item information function by number of correct
responses.

Figure 14. Item information functions with cut scores
represented.